ABSTRACT

Paddy, the staple food crop of South India, faces a consistently high demand. Paddy cultivators contend with
several stress conditions, including salinity, drought, floods and harsh climatic changes. The ability to withstand
these conditions is a property of the paddy variety and is attributed to its genetic makeup. Genes that contribute
to stress tolerance in the plant include LEA, DREB and OsDHODH1, which are considered in the current study. The work
extends our previous study on the extraction and sequencing of these genes. The sequences are analyzed both
structurally and functionally so that crop improvement strategies can be implemented if required. The three genes
were also subjected to a conservation study of their gene and protein sequences. The comparative studies revealed
that the three genes share no sequence similarity or conservation at either the gene or the protein level. The
protein network study failed to detect any common ligands or interacting proteins in their metabolic pathways. The
study concludes that, at the sequence level, the three genes share no similarity or conservation in spite of their
functional similarity in stabilizing the plant under stress conditions.
INTRODUCTION

Oryza sativa is one of the plant species most commonly used in laboratory investigations. It is the most
important crop of South India and the most widely consumed food. In spite of its importance and high demand,
certain stress factors limit the crop yield every year. It is desirable for molecular biologists and agriculturists
to counter these stress-related factors and overcome the shortage in the food supply to the population.

Alongside the many research protocols undertaken in molecular biology, in silico strategies for managing high
yield and developing quality products have established their own platform. Several in silico tools and software
packages enable the user to examine in detail the genetic and proteomic data of the study sample. Computational
protocols can complement laboratory research and may in some cases provide considerably more detail and clarity
than the molecular techniques alone.

The three genes involved in imparting drought and stress tolerance to paddy crops include LEA, DREB and
OsDHODH1. All three are known for their similar effect of stabilizing the plant under stress. In view of this, the
current work aimed to perform a comparative study of these three genes at the genetic level and to establish their
evolutionary relationship, which revealed a very diverse genetic makeup.

In the current study, the sequence, structure, network and phylogeny of the three genes were analyzed, and
conclusions were drawn based on their degree of similarity with respect to plant resistance.

MATERIALS AND METHODS 

Sequence Retrieval from NCBI 

NCBI is a public database containing information on proteins, genes, SNPs, domains, primers, etc. It is one of
the three primary databases in existence and is redundant in nature. NCBI's Gene and Protein databases were used to
retrieve the sequences of all three genes and their respective proteins. The sequences were retrieved in FASTA
format, and important information such as the accession number, functional regions and active sites was noted.
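
Once a sequence has been saved in FASTA format, it can be loaded for the later analyses with a few lines of code.
The following is a minimal sketch, not the procedure used in this study; the function name `parse_fasta` and the
example header are hypothetical, and only the first ten bases of LEA reported later in the text are used as sample
data.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping header -> sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading '>'
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    # join the wrapped sequence lines of each record
    return {h: "".join(parts) for h, parts in records.items()}


# Hypothetical record; the header text is illustrative only.
example = ">LEA_example\nATGGC\nTTCCC\n"
sequences = parse_fasta(example)
```

Here `sequences["LEA_example"]` holds the joined sequence, regardless of how the file wrapped its lines.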

Conservation and Phylogenetic Study Using CLUSTAL W of EMBL 

To detect sequence conservation at both the genetic and proteomic levels, the three gene sequences and their
respective protein sequences were compared using the CLUSTALW tool from EMBL. The tool identifies the degree of
similarity, the gaps and the conservation present among the three sequences. The conservation study was followed by
construction of an evolutionary tree based on the results of the multiple sequence alignment.
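
The degree of similarity between two aligned sequences is often summarized as percent identity. The sketch below
shows one common way to compute it from two rows of an alignment; it is an illustration, not CLUSTALW's own
scoring. It assumes that gaps are written as "-" and that columns containing a gap are excluded from the
denominator, which is only one of several conventions in use. The function name and the test strings are
hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the ungapped columns of two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # ignore columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0
```

Values near zero across all pairs would correspond to the absence of conservation described in the results.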

Codon Plot 

Codon Plot is available in the Sequence Manipulation Suite (SMS) at bioinformatics.org and is used to detect the
exact frame from which the coding of the gene starts. This information helps to identify important codons that may
be involved in a binding site, an SNP or any other function, and to assess the effect of an SNP on protein
translation. The triplet codon pattern was determined for all three gene sequences and the results were recorded.
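
The triplet listing that such a tool produces can be sketched as follows. This is an illustrative
reimplementation under the assumption of the standard genetic code, not the SMS source code; `list_codons` and its
`frame` parameter are hypothetical names. Positions are 1-based, as in the Codon Plot output.

```python
# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def list_codons(dna, frame=0):
    """Return (codon, start, end, amino_acid) tuples for one reading frame."""
    dna = dna.upper()
    rows = []
    for start in range(frame, len(dna) - 2, 3):
        codon = dna[start:start + 3]
        # 'X' marks codons with characters outside A, C, G, T
        rows.append((codon.lower(), start + 1, start + 3, CODON_TABLE.get(codon, "X")))
    return rows
```

Applied to the first ten bases of LEA, "ATGGCTTCCC", frame 0 gives atg (1 to 3, M), gct (4 to 6, A) and
tcc (7 to 9, S); the trailing base is left out because it does not complete a codon.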

SMART Domain Analysis 

SMART is a protein analysis tool used to identify domains, i.e. the functional regions, present in a user-entered
sequence. It detects domains by comparing the query against a built-in database of known protein domains and reports
the domain information for the matching entries. This information includes the number and types of domains, their
length, location and function. Regions of low complexity, if any, are also highlighted in the results. SMART analysis
was performed for all three proteins and the results were recorded.

STRUCTURAL STUDIES OF THE PROTEIN 

Primary Structure Analysis Using ProtParam

ProtParam is a tool from ExPASy used for the physicochemical characterization of a protein. It can be regarded as
a primary structure analysis tool: it computes sequence-based properties such as length, molecular weight,
hydropathicity, instability index, isoelectric point and half-life. These parameters in turn help in the development
of experimental protocols.
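
Two of these sequence-based properties are simple enough to sketch directly. The example below computes the grand
average of hydropathicity (GRAVY) as the mean Kyte-Doolittle hydropathy value, and the aliphatic index as
X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is the mole percent of each residue. It is an illustration of
the definitions, not ProtParam's code, and it assumes the input contains only the 20 standard one-letter residues.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein):
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    protein = protein.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in protein) / len(protein)


def aliphatic_index(protein):
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    protein = protein.upper()
    n = len(protein)

    def mole_percent(aa):
        return 100.0 * protein.count(aa) / n

    return mole_percent("A") + 2.9 * mole_percent("V") + 3.9 * (mole_percent("I") + mole_percent("L"))
```

A negative GRAVY value indicates a hydrophilic protein on average, which is how the values reported in the results
are interpreted.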

Secondary Structure Analysis Using SOPMA

The secondary structure must be analyzed to identify internal structural conformations such as helices, strands,
turns and coils. SOPMA is one such tool used to predict these conformations within the protein. Along with the
per-residue conformations, it provides a graphical summary of the result.
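
The composition summary reported alongside the prediction is a count of each state divided by the sequence length.
A minimal sketch follows, assuming a per-residue state string that uses the one-letter codes h (alpha helix),
e (extended strand), t (beta turn) and c (random coil); the function name and the sample string are hypothetical.

```python
from collections import Counter

# One-letter state codes assumed for the prediction string.
STATE_NAMES = {
    "h": "alpha helix",
    "e": "extended strand",
    "t": "beta turn",
    "c": "random coil",
}


def summarize_states(states):
    """Return {state name: (count, percent)} for a per-residue state string."""
    states = states.lower()
    counts = Counter(states)
    total = len(states)
    return {
        name: (counts.get(code, 0), round(100.0 * counts.get(code, 0) / total, 2))
        for code, name in STATE_NAMES.items()
    }
```

For the ten-residue string "hhhhcccett" this gives 40% helix, 10% strand, 20% turn and 30% coil.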

Tertiary Structure Prediction Using PHYRE 

Phyre is a 3D structure prediction tool that runs a BLAST-like search, comparing the user-entered sequence with
the protein sequences stored in the PDB database. The result is a list of the PDB IDs of known structures that share
the highest sequence similarity with the query. The corresponding structures can be downloaded from the Protein Data
Bank for visualization.

RASMOL Visualization of the PDB Structures 

RasMol is visualization software used to view the 3D structure of a protein. It is a command-line-driven program
that can be used to depict the 3D structure, highlight regions of interest, show the ligands and water molecules
present in the structure, and focus on the ligands bound to the protein.
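
Before opening a structure in a viewer, it can be useful to know whether the file contains any ligands or water
molecules at all. The sketch below counts them from the HETATM records of a PDB-format file, using the fixed column
positions of that format (residue name in columns 18 to 20, chain in column 22, residue number in columns 23 to 26).
It is an independent illustration and not part of RasMol; the function name is hypothetical, and treating every
non-water HETATM residue as a ligand is a simplifying assumption.

```python
def count_hetero_residues(pdb_text):
    """Count distinct water and non-water hetero residues in PDB-format text."""
    waters, ligands = set(), set()
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue  # ATOM and other record types are ignored
        res_name = line[17:20].strip()
        chain_id = line[21:22]
        res_seq = line[22:26].strip()
        key = (res_name, chain_id, res_seq)
        if res_name == "HOH":
            waters.add(key)
        else:
            ligands.add(key)  # assumption: any other hetero residue is a ligand
    return len(waters), len(ligands)
```

The function returns a pair (number of waters, number of ligands); several atoms belonging to the same residue are
counted once.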

RESULTS AND DISCUSSIONS 

Protein and Gene Sequence Retrieval 

The protein sequences of all three proteins were retrieved and their sequence characteristics were analyzed. The
protein sequences of LEA, DREB and OsDHODH1 are 200 aa, 298 aa and 414 aa in length, respectively. The sequences were
further analyzed with respect to their function, structure, etc. In addition to the protein sequences, the gene
sequences were retrieved from the same database. The gene sequences were 603 bp, 897 bp and 1245 bp in length.
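
The reported gene and protein lengths can be cross-checked with simple arithmetic. The sketch below assumes that
each gene sequence is a coding sequence ending in one stop codon, so that its length is three times the protein
length plus three bases; this assumption is ours and is not stated in the source. The function name is hypothetical.

```python
def expected_cds_length(protein_length, include_stop=True):
    """Length in bp of a coding sequence for a protein of the given length."""
    return 3 * (protein_length + (1 if include_stop else 0))


# (protein length in aa, gene length in bp) as reported in the text
reported = {"LEA": (200, 603), "DREB": (298, 897), "OsDHODH1": (414, 1245)}
consistent = {
    name: expected_cds_length(aa) == bp for name, (aa, bp) in reported.items()
}
```

Under that assumption all three pairs agree: 3 x 201 = 603, 3 x 299 = 897 and 3 x 415 = 1245.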

Comparison of the Gene and Protein Sequences 

The three genes and their corresponding protein sequences were compared for sequence similarity using the
Clustal W programme, and their evolutionary relationship was assessed using the phylogenetic tree constructed by the
same programme. The results are shown below.


[Phylogenetic tree with three leaves: DREB2B, LEA and OsDHODH1 (putative)]

Figure 1: The Phylogenetic Tree of the Three Genes LEA, DREB and OsDHODH1

The above tree indicates a closer relatedness of DREB to LEA, which may be regarded as partially diverged
paralogues, whereas OsDHODH1 (putative) lies on a separate evolutionary branch, indicating no similarity. The three
sequences share no sequence-level similarity at either the gene or the protein level. This can be clearly seen in
their CLUSTAL W multiple sequence alignment.

Codon Plot 

The triplet coding pattern of the three genes was analyzed to detect the exact frame in which translation can be
performed in silico. The results indicate that all three genes are coded from the first base, i.e. the coding pattern
starts from the beginning of each sequence. The results are shown below.


Codon Plot results

Results for 603 residue sequence "LEA" starting "ATGGCTTCCC"

[Codon Plot output; first codons legible in the source: atg, 1 to 3 (Met); gct, 4 to 6; tcc, 7 to 9 (Ser);
10 to 12 (His); cag, 13 to 15 (Gln); gac, 16 to 18 (Asp); 19 to 21 (Gln); gct, 22 to 24 (Ala)]

Figure 2a: Showing the Codon Plot Result for LEA


Codon Plot results

Results for 897 residue sequence "DREB" starting "ATGAAGGGGA"

[Codon Plot output; first codons legible in the source: atg, 1 to 3 (Met); 4 to 6 (Lys); ggg, 7 to 9 (Gly)]

Figure 2b: Showing the Codon Plot Result for DREB


Codon Plot results

Results for 1245 residue sequence starting "ATGGAGTCGC"

[Codon Plot output; first codon legible in the source: 1 to 3 (Met)]

Figure 2c: Showing the Codon Plot Result for OsDHODH1

SMART Domain Analysis

Domain analysis of the three proteins was performed to detect the functional units within their sequences. A
comparative study was also performed to detect any domain common to the three.

Figure 3a: The domain Analysis of LEA 

The above result shows that no individual domain was detected in the LEA protein. However, two regions of low
complexity were detected, at positions 104 and 186.




Figure 3b: Domain Regions of DREB 

As shown in the above figure, the only domain found in the DREB protein is AP2, located from position 13 to 76.
This is a DNA-binding domain found in plant species and is involved in the regulation of transcription.

For OsDHODH1, the domain analysis returned no domain data.

Structural Analysis of Proteins 
Protparam Analysis for the Three Proteins 

ProtParam was employed to study the physicochemical characteristics of the three proteins. The tool reports
information such as the length, stability, half-life and hydropathicity of the protein, which can later be used in
developing experimental protocols. The ProtParam results for the three proteins are shown below.

LEA Protein:

Number of amino acids: 200
Molecular weight: 20512.10
Theoretical pI: 3.33

Total number of negatively charged residues (Asp + Glu): 33
Total number of positively charged residues (Arg + Lys): 30

This protein does not contain any Trp residues. Experience shows that
this could result in more than 10% error in the computed extinction
coefficient.

Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.

Ext. coefficient 2980
Abs 0.1% (=1 g/l) 0.145

Estimated half-life:
The N-terminal of the sequence considered is M (Met).
The estimated half-life is: 30 hours (mammalian reticulocytes, in vitro).
>20 hours (yeast, in vivo).
>10 hours (Escherichia coli, in vivo).

Instability index:
The instability index (II) is computed to be 25.10
This classifies the protein as stable.

Aliphatic index: 33.30

Grand average of hydropathicity (GRAVY): -1.065

Figure 4a: Physicochemical Characteristics of LEA Protein

Inference

From the above results, it can be inferred that the LEA protein has a length of 200 amino acids and a molecular
weight of 20512.10 Da. The protein is basic in nature, with more positively charged amino acids. It is a stable
protein with an instability index of 25.10 and is hydrophilic in nature.


DREB Protein:

Number of amino acids: 298
Molecular weight: 32158.39
Theoretical pI: 4.28

Total number of negatively charged residues (Asp + Glu): 52
Total number of positively charged residues (Arg + Lys): 33

Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.

Ext. coefficient 36690
Abs 0.1% (=1 g/l) 1.141, assuming all pairs of Cys residues form cystines

Ext. coefficient 36440
Abs 0.1% (=1 g/l) 1.133, assuming all Cys residues are reduced

Estimated half-life:
The N-terminal of the sequence considered is M (Met).
The estimated half-life is: 30 hours (mammalian reticulocytes, in vitro).
>20 hours (yeast, in vivo).
>10 hours (Escherichia coli, in vivo).

Instability index:
The instability index (II) is computed to be 51.22
This classifies the protein as unstable.

Aliphatic index: 64.93

Grand average of hydropathicity (GRAVY): -0.525

Figure 4b: Physicochemical Characteristics of DREB Protein


Inference 


From the above ProtParam result, it can be concluded that the DREB protein is 298 aa in length with a
corresponding molecular weight of 32158.39 Da. The isoelectric point of the protein was found to be 4.28 and the
instability index was 51.22, indicating that the protein is unstable. With a pI of 4.28 it is an acidic protein, and
it is polar in nature.


OsDHODH1 Protein:

Number of amino acids: 414
Molecular weight: 45308.90
Theoretical pI: 6.29

Total number of negatively charged residues (Asp + Glu): 51
Total number of positively charged residues (Arg + Lys): 43

Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.

Ext. coefficient 45295
Abs 0.1% (=1 g/l) 1.000, assuming all pairs of Cys residues form cystines

Ext. coefficient 44920
Abs 0.1% (=1 g/l) 0.991, assuming all Cys residues are reduced

Estimated half-life:
The N-terminal of the sequence considered is M (Met).
The estimated half-life is: 30 hours (mammalian reticulocytes, in vitro).
>20 hours (yeast, in vivo).
>10 hours (Escherichia coli, in vivo).

Instability index:
The instability index (II) is computed to be 42.04
This classifies the protein as unstable.

Aliphatic index: 81.52

Grand average of hydropathicity (GRAVY): -0.260

Figure 4c: Physicochemical Characteristics of OsDHODH1 Protein

Inference

From the above ProtParam result for OsDHODH1, it can be inferred that the protein is 414 amino acids in length
with a corresponding molecular weight of 45308.90 Da. With a theoretical pI of 6.29 the protein is slightly acidic,
and its instability index of 42.04 classifies it as unstable. The hydropathicity of the protein was found to be
-0.260, making it polar in nature.
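
The stability calls for the three proteins follow from a single threshold: ProtParam predicts a protein as stable
when its instability index is below 40 and as potentially unstable otherwise. The sketch below applies that rule to
the three values reported above; the function and variable names are hypothetical.

```python
STABILITY_THRESHOLD = 40.0  # instability index below this is predicted stable


def classify_stability(instability_index):
    """Classify a protein as 'stable' or 'unstable' from its instability index."""
    return "stable" if instability_index < STABILITY_THRESHOLD else "unstable"


# Instability indices as reported in the text for the three proteins.
reported_indices = {"LEA": 25.10, "DREB": 51.22, "OsDHODH1": 42.04}
classes = {name: classify_stability(value) for name, value in reported_indices.items()}
```

This reproduces the classification given in the inferences: LEA stable, DREB and OsDHODH1 unstable.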

Secondary Structure Prediction of the Three Proteins Using SOPMA

SOPMA reports the secondary structural conformations of a protein. The tool was employed to predict the
structural conformations of the three proteins. The results are shown in the figures below.


Sequence length: 200

SOPMA:
Alpha helix (Hh): 135 is 67.50%
310 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 5 is 2.50%
Beta turn (Tt): 13 is 6.50%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 47 is 23.50%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%

Figure 5a: Secondary Structure of LEA


[SOPMA output for DREB: per-residue state string and composition summary. Values legible in the source: alpha helix
(Hh) about 34.5%; extended strand (Ee) about 15%; beta turn (Tt) about 12%; 310 helix, pi helix, beta bridge, bend
region, ambiguous states and other states 0.00%]

Figure 5b: Structural Conformations of DREB

Figure 5c: Secondary Structural Characterization of OsDHODH1


Inference: The secondary structure results for the three proteins show the conformational state predicted for
each amino acid within the protein. The detailed composition of helices, strands and random coils is provided.


Tertiary Structure Prediction Using Phyre 

Phyre compares the user-entered sequence with the sequences in the Protein Data Bank in order to obtain the
structure that best matches the query.

In addition to the best-matching PDB IDs reported by Phyre, the structures of the proteins can be studied using
the RasMol visualization software.

The Phyre and PDB results indicate that structure data for the three proteins are not available in the database.
Hence, homology modelling can be employed to develop the structures of the three gene products.


CONCLUSIONS 


In the current work, a detailed in silico analysis of the three proteins LEA, DREB and OsDHODH1 was carried out.
The analysis was also used to look for similarities among the three. The results show that the three sequences do not
share any considerable similarity. A structural study was performed to analyze the physicochemical characteristics of
the proteins, followed by identification of their secondary structural conformations, in which all three proteins
were found to be richer in helices than in strands. The tertiary structure analysis revealed that the structures of
the three proteins are not known and have to be predicted. Modelling approaches can be employed to develop the
structures of the three proteins.